AITopics | chi-squared distribution

chi-squared distribution

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Privacy Amplification Persists under Unlimited Synthetic Data Release

Pierquin, Clément, Bellet, Aurélien, Tommasi, Marc, Boussard, Matthieu

arXiv.org Machine Learning Feb-6-2026

We study privacy amplification by synthetic data release, a phenomenon in which differential privacy guarantees are improved by releasing only synthetic data rather than the private generative model itself. Recent work by Pierquin et al. (2025) established the first formal amplification guarantees for a linear generator, but they apply only in asymptotic regimes where the model dimension far exceeds the number of released synthetic records, limiting their practical relevance. In this work, we show a surprising result: under a bounded-parameter assumption, privacy amplification persists even when releasing an unbounded number of synthetic records, thereby improving upon the bounds of Pierquin et al. (2025). Our analysis provides structural insights that may guide the development of tighter privacy guarantees for more complex release mechanisms.

artificial intelligence, fisher information, machine learning, (11 more...)

arXiv.org Machine Learning

2602.04895

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Security & Privacy (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Bayesian Optimization of Robustness Measures Using Randomized GP-UCB-based Algorithms under Input Uncertainty

Inatsu, Yu

arXiv.org Machine Learning Apr-4-2025

Bayesian optimization based on Gaussian process upper confidence bound (GP-UCB) has a theoretical guarantee for optimizing black-box functions. Black-box functions often have input uncertainty, but even in this case, GP-UCB can be extended to optimize evaluation measures called robustness measures. However, GP-UCB-based methods for robustness measures include a trade-off parameter $\beta$, which must be excessively large to achieve theoretical validity, just like the original GP-UCB. In this study, we propose a new method called randomized robustness measure GP-UCB (RRGP-UCB), which samples the trade-off parameter $\beta$ from a probability distribution based on a chi-squared distribution and avoids explicitly specifying $\beta$. The expected value of $\beta$ is not excessively large. Furthermore, we show that RRGP-UCB provides tight bounds on the expected value of regret based on the optimal solution and estimated solutions. Finally, we demonstrate the usefulness of the proposed method through numerical experiments.
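The core idea of drawing the trade-off parameter at random, rather than fixing a large $\beta$, can be sketched as follows. This is a simplified illustration, not RRGP-UCB itself: it ignores input uncertainty and the robustness measure, assumes the GP posterior mean and standard deviation at each candidate are already computed, and draws $\beta$ from a plain chi-squared distribution whose degrees of freedom (`dof`) are an illustrative assumption, since the abstract only says the distribution is "based on" a chi-squared. It shows why the expected $\beta$ stays moderate: a chi-squared draw has mean equal to its degrees of freedom.

```python
import math
import random


def sample_chi2(dof: float, rng: random.Random) -> float:
    """Chi-squared(dof) draw, generated as a Gamma(shape=dof/2, scale=2) variate."""
    return rng.gammavariate(dof / 2.0, 2.0)


def randomized_ucb_select(means, stds, dof=2.0, rng=None):
    """Pick the candidate maximizing mu + sqrt(beta) * sigma with a random beta.

    `means` and `stds` are assumed GP posterior values at each candidate.
    `dof` is a hypothetical choice, not a value taken from the paper.
    """
    rng = rng or random.Random()
    beta = sample_chi2(dof, rng)
    scores = [m + math.sqrt(beta) * s for m, s in zip(means, stds)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, beta


if __name__ == "__main__":
    rng = random.Random(0)
    idx, beta = randomized_ucb_select([0.1, 0.5, 0.3], [0.9, 0.1, 0.4], rng=rng)
    print(f"chose candidate {idx} with beta={beta:.3f}")
    # The empirical mean of beta should be close to dof (here 2).
    draws = [sample_chi2(2.0, rng) for _ in range(100_000)]
    print(f"mean beta ~ {sum(draws) / len(draws):.3f}")
```

Because a fresh $\beta$ is drawn at every step, occasional large draws still push exploration, while the average width of the confidence bound stays fixed at the chosen degrees of freedom instead of growing with the iteration count.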

artificial intelligence, machine learning, optimization problem, (15 more...)

arXiv.org Machine Learning

2504.03172

Country:

North America > United States (0.04)
Asia > Russia > Siberian Federal District > Novosibirsk Oblast > Novosibirsk (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Active Learning for Level Set Estimation Using Randomized Straddle Algorithms

Inatsu, Yu, Takeno, Shion, Kutsukake, Kentaro, Takeuchi, Ichiro

arXiv.org Machine Learning Aug-6-2024

Level set estimation (LSE), the problem of identifying the set of input points where a function takes values above (or below) a given threshold, is important in practical applications. When the function is black-box and expensive to evaluate, the straddle algorithm, a representative heuristic for LSE based on Gaussian process models, has been developed, together with extensions that have theoretical guarantees. However, many existing methods include a confidence parameter $\beta^{1/2}_t$ that must be specified by the user, and methods that choose $\beta^{1/2}_t$ heuristically do not provide theoretical guarantees. In contrast, theoretically guaranteed values of $\beta^{1/2}_t$ must grow with the number of iterations and candidate points, which makes them conservative and hurts practical performance. In this study, we propose a novel method, the randomized straddle algorithm, in which $\beta_t$ in the straddle algorithm is replaced by a random sample from the chi-squared distribution with two degrees of freedom. The confidence parameter in the proposed method has the advantages of requiring no tuning, not depending on the number of iterations and candidate points, and not being conservative. Furthermore, we show that the proposed method has theoretical guarantees that depend on the sample complexity and the number of iterations. Finally, we confirm the usefulness of the proposed method through numerical experiments using synthetic and real data.
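One selection step of a randomized straddle rule can be sketched as below, assuming the GP posterior mean and standard deviation at each candidate point are already available (the candidate values in the test are hypothetical). The straddle score $\beta^{1/2}\sigma(x) - |\mu(x) - h|$ equals $\min(\mathrm{ucb}(x) - h,\ h - \mathrm{lcb}(x))$ for the interval $\mu \pm \beta^{1/2}\sigma$, so it favors points whose confidence interval most evenly straddles the threshold $h$. Following the abstract, $\beta$ is drawn from a chi-squared distribution with two degrees of freedom; this is a sketch of that idea, not the authors' code.

```python
import math
import random


def randomized_straddle_select(means, stds, threshold, rng=None):
    """One step of a randomized straddle rule for level set estimation.

    Score: sqrt(beta) * sigma(x) - |mu(x) - h|, which equals
    min(ucb - h, h - lcb) with ucb/lcb = mu +/- sqrt(beta) * sigma.
    beta is drawn from a chi-squared distribution with 2 degrees of freedom.
    """
    rng = rng or random.Random()
    # chi-squared(2) == Gamma(shape 1, scale 2) == exponential with mean 2
    beta = rng.gammavariate(1.0, 2.0)
    root_beta = math.sqrt(beta)
    scores = [root_beta * s - abs(m - threshold) for m, s in zip(means, stds)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, beta
```

A chi-squared variable with two degrees of freedom is simply an exponential variable with mean 2, which is why the confidence width needs no tuning and does not grow with the number of iterations or candidate points.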

algorithm, black-box function, chi-squared distribution, (16 more...)

arXiv.org Machine Learning

2408.03144

Country: Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)

Towards Exact Computation of Inductive Bias

Boopathy, Akhilan, Yue, William, Hwang, Jaedong, Iyer, Abhiram, Fiete, Ila

arXiv.org Machine Learning Jun-22-2024

Much research in machine learning involves finding appropriate inductive biases (e.g. convolutional neural networks, momentum-based optimizers, transformers) to promote generalization on tasks. However, quantification of the amount of inductive bias associated with these architectures and hyperparameters has been limited. We propose a novel method for efficiently computing the inductive bias required for generalization on a task with a fixed training data budget; formally, this corresponds to the amount of information required to specify well-generalizing models within a specific hypothesis space of models. Our approach involves modeling the loss distribution of random hypotheses drawn from a hypothesis space to estimate the required inductive bias for a task relative to these hypotheses. Unlike prior work, our method provides a direct estimate of inductive bias without using bounds and is applicable to diverse hypothesis spaces. Moreover, we derive approximation error bounds for our estimation approach in terms of the number of sampled hypotheses. Consistent with prior results, our empirical results demonstrate that higher dimensional tasks require greater inductive bias. We show that relative to other expressive model classes, neural networks as a model class encode large amounts of inductive bias. Furthermore, our measure quantifies the relative difference in inductive bias between different neural network architectures. Our proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures for certain tasks and provides a quantitative guide to developing tasks requiring greater inductive bias, thereby encouraging the development of more powerful inductive biases.
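A toy Monte Carlo sketch of the quantity the abstract describes, the information needed to specify a well-generalizing model within a hypothesis space: $-\log_2 P(\mathrm{error}(\theta) \le \varepsilon)$ for hypotheses $\theta$ drawn at random. Everything concrete here is an assumption for illustration: the hypothesis space is linear classifiers $\mathrm{sign}(\theta \cdot x)$ with $\theta \sim N(0, I)$, the task labels come from a hidden random linear rule, and the probability is estimated by plain counting rather than by fitting a model to the loss distribution as the paper does. In two dimensions a random direction's error is roughly uniform on [0, 1], so for $\varepsilon = 0.1$ the estimate should land near $\log_2 10 \approx 3.3$ bits.

```python
import math
import random


def sample_gaussian_vector(d, rng):
    return [rng.gauss(0.0, 1.0) for _ in range(d)]


def error_rate(w, xs, ys):
    """Fraction of points where sign(w . x) disagrees with the label."""
    wrong = 0
    for x, y in zip(xs, ys):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        wrong += pred != y
    return wrong / len(xs)


def estimate_inductive_bias_bits(d, epsilon, n_test=300, n_hyp=3000, seed=0):
    """Monte Carlo estimate of -log2 P(error(theta) <= epsilon), theta ~ N(0, I).

    Toy task (hypothetical): labels come from a random linear rule w_star.
    Returns math.inf if no sampled hypothesis reaches the target error.
    """
    rng = random.Random(seed)
    w_star = sample_gaussian_vector(d, rng)
    xs = [sample_gaussian_vector(d, rng) for _ in range(n_test)]
    ys = [1 if sum(a * b for a, b in zip(w_star, x)) >= 0 else -1 for x in xs]
    hits = sum(
        error_rate(sample_gaussian_vector(d, rng), xs, ys) <= epsilon
        for _ in range(n_hyp)
    )
    if hits == 0:
        return math.inf
    return -math.log2(hits / n_hyp)


if __name__ == "__main__":
    print(f"d=2, eps=0.1: {estimate_inductive_bias_bits(2, 0.1):.2f} bits")
```

In higher dimensions random directions concentrate away from the target rule, so the estimate grows (and becomes infinite once no sample reaches $\varepsilon$), which is consistent with the abstract's observation that higher-dimensional tasks require more inductive bias.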

hypothesis, hypothesis space, inductive bias, (14 more...)

arXiv.org Machine Learning

2406.15941

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Romania (0.04)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Bounding Reconstruction Attack Success of Adversaries Without Data Priors

Ziller, Alexander, Riess, Anneliese, Schwethelm, Kristian, Mueller, Tamara T., Rueckert, Daniel, Kaissis, Georgios

arXiv.org Artificial Intelligence Feb-20-2024

Reconstruction attacks on machine learning (ML) models pose a strong risk of sensitive data leakage. In specific contexts, an adversary can (almost) perfectly reconstruct training data samples from a trained model using the model's gradients. When ML models are trained with differential privacy (DP), formal upper bounds on the success of such reconstruction attacks can be provided. So far, these bounds have been formulated under worst-case assumptions that may not hold in realistic settings. In this work, we provide formal upper bounds on reconstruction success under realistic adversarial settings against ML models trained with DP and support these bounds with empirical results. With this, we show that in realistic scenarios, (a) the expected reconstruction success can be bounded appropriately in different contexts and by different metrics, which (b) allows for a better-informed choice of the privacy parameter.
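The abstract does not state its bounds, so the sketch below only illustrates how DP noise limits reconstruction error in the simplest possible setting, under stated assumptions: a single d-dimensional secret released through the classic Gaussian mechanism, with noise scale $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$ (valid for $0 < \varepsilon < 1$), and a naive adversary who takes the noisy release as its reconstruction. That adversary's squared error is $\sigma^2$ times a chi-squared variable with d degrees of freedom, so its expected per-coordinate MSE is exactly $\sigma^2$. This is not the paper's threat model or bound.

```python
import math
import random


def gaussian_mechanism_sigma(epsilon, delta, sensitivity=1.0):
    """Classic Gaussian-mechanism noise scale, valid for 0 < epsilon < 1."""
    if not 0 < epsilon < 1:
        raise ValueError("classic calibration assumes 0 < epsilon < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def naive_reconstruction_mse(secret, sigma, rng):
    """Release secret + N(0, sigma^2 I) and 'reconstruct' by reading it off.

    The summed squared error is sigma^2 times a chi-squared(d) variable,
    so the expected per-coordinate MSE equals sigma^2.
    """
    noisy = [s + rng.gauss(0.0, sigma) for s in secret]
    return sum((n - s) ** 2 for n, s in zip(noisy, secret)) / len(secret)


if __name__ == "__main__":
    sigma = gaussian_mechanism_sigma(epsilon=0.5, delta=1e-5)
    mse = naive_reconstruction_mse([0.0] * 10_000, sigma, random.Random(0))
    print(f"sigma={sigma:.2f}, empirical MSE={mse:.1f}, sigma^2={sigma ** 2:.1f}")
```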

adversary, mse, reconstruction attack, (15 more...)

arXiv.org Artificial Intelligence

2402.12861

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Probability Distributions To Be Aware Of For Data Science (With Code)

#artificialintelligence May-20-2022, 05:29:00 GMT

Probability and statistics knowledge is at the core of data science and machine learning; you'll need both to effectively gather, review, analyze, and communicate with data. This means it's essential to have a good grasp of some fundamental terminology: what the terms mean and how to identify them. One term you'll hear often is "distribution," which simply refers to the properties of the data. Many phenomena in the real world are considered statistical in nature, which means we have been able to develop methodologies that model them through mathematical functions describing the characteristics of the data.
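One distribution worth knowing in this family is the chi-squared distribution: a chi-squared variable with k degrees of freedom is the sum of squares of k independent standard normal variables, with mean k and variance 2k. A minimal simulation using only Python's standard library illustrates this; the seed, k, and sample size are arbitrary choices.

```python
import random
import statistics


def chi_squared_draw(k, rng):
    """Sum of squares of k independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(42)
k = 4
samples = [chi_squared_draw(k, rng) for _ in range(50_000)]
print(f"sample mean {statistics.fmean(samples):.2f} (theory {k})")
print(f"sample variance {statistics.variance(samples):.2f} (theory {2 * k})")
```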

normal distribution, probability, probability distribution, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.54)

Statistics (III) ANOVA in Data Science & Machine Learning

#artificialintelligence Feb-15-2022, 21:40:06 GMT

For the last part of the Statistics series, we will cover the ANOVA, Post-hoc Pairwise Comparison, Two-way ANOVA, and R-squared. Previously, our study focused on one or two groups of subjects. How can we handle the concept of multiple groups with multiple factors? For example, the dose level and gender may impact the effectiveness of a vaccine. How can we determine whether it is statistically significant for particular combinations?
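The one-way case of ANOVA can be sketched in a few lines: it compares the spread of the group means (between-group sum of squares) with the spread inside each group (within-group sum of squares). This is a plain-Python sketch, and the dose-level responses in the usage lines are made up. Under the null hypothesis of equal means with normal errors, the statistic follows an F distribution with (k − 1, n − k) degrees of freedom, that is, a ratio of two independent chi-squared variables, each divided by its degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical vaccine responses at three dose levels.
    low, mid, high = [1, 2, 3], [4, 5, 6], [7, 8, 9]
    print(one_way_anova_f([low, mid, high]))
```

If SciPy is available, `scipy.stats.f.sf(f_stat, df_between, df_within)` gives the corresponding p-value, and `scipy.stats.f_oneway` runs the whole test directly. Two-way ANOVA (for example, dose level and gender) extends the same decomposition with a sum of squares per factor and one for their interaction.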

anova, compute, statistics, (13 more...)

#artificialintelligence

Industry: Health & Medicine > Therapeutic Area (0.55)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.32)

Characterizing Deep Gaussian Processes via Nonlinear Recurrence Systems

Tong, Anh, Choi, Jaesik

arXiv.org Machine Learning Oct-20-2020

Recent advances in Deep Gaussian Processes (DGPs) show their potential for more expressive representations than traditional Gaussian Processes (GPs). However, DGPs suffer from a pathology in which their learning capacity decreases significantly as the number of layers increases. In this paper, we present a new analysis that explains this issue by studying the nonlinear dynamic systems corresponding to DGPs. Existing work reports the pathology for the squared exponential kernel function; we extend the investigation to four types of common stationary kernel functions. The recurrence relations between layers are derived analytically, providing a tighter bound and the rate of convergence of the dynamic systems. We demonstrate our findings with a number of experimental results.
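A simplified sketch of the layer-to-layer recurrence idea, restricted to the squared exponential kernel $k(x, x') = \sigma^2 \exp(-(x - x')^2 / (2\ell^2))$ and one-dimensional layers. If two inputs are at squared distance $m$, the difference of their outputs under a zero-mean GP is Gaussian with variance $2\sigma^2(1 - \exp(-m / (2\ell^2)))$, so the new squared distance is that variance times a chi-squared variable with one degree of freedom. Replacing the random squared distance by its expectation at every layer (a mean-field simplification, not the recurrences derived in the paper) gives a one-line map whose slope at zero is $\sigma^2 / \ell^2$. When that slope is at most 1, distances shrink to zero with depth, so all inputs collapse to the same output, which is one way to see the pathology.

```python
import math


def next_sq_distance(m, variance, lengthscale):
    """E[(f(x) - f(x'))^2] for a 1-D GP layer with SE kernel, given |x - x'|^2 = m."""
    return 2.0 * variance * (1.0 - math.exp(-m / (2.0 * lengthscale ** 2)))


def iterate_layers(m0, variance, lengthscale, n_layers):
    """Apply the mean-field recurrence layer by layer; returns the whole trajectory."""
    traj = [m0]
    for _ in range(n_layers):
        traj.append(next_sq_distance(traj[-1], variance, lengthscale))
    return traj


if __name__ == "__main__":
    for var in (0.5, 4.0):
        traj = iterate_layers(1.0, var, 1.0, 50)
        print(f"variance={var}: slope at 0 = {var}, after 50 layers m = {traj[-1]:.3g}")
```

With variance 0.5 the squared distance roughly halves per layer and vanishes, while with variance 4.0 it settles at a positive fixed point near 7.8.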

artificial intelligence, machine learning, recurrence relation, (18 more...)

arXiv.org Machine Learning

2010.09301

Country: Asia > South Korea > Ulsan > Ulsan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.73)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)

Famous Probability Distributions in Data Science

#artificialintelligence Oct-17-2020, 17:50:11 GMT

Data scientists are modern-day statisticians who take on complex business problems and solve them with the help of data. Probability distributions allow a data scientist or data analyst to recognize patterns in otherwise random variables. A normal distribution is generally described as a bell-shaped curve, and it depicts the frequency of something you are measuring, such as class scores. The center of the curve is the mean, and its width is described by the standard deviation. The most frequently occurring score is the mean.
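The normal-curve description can be checked with Python's standard library: `statistics.NormalDist.from_samples` estimates the mean and standard deviation from data, the density peaks at the mean, and the CDF gives the share of the distribution within one standard deviation of the mean (about 68%). The scores below are hypothetical.

```python
from statistics import NormalDist

scores = [62, 70, 71, 75, 78, 80, 81, 84, 88, 95]  # hypothetical class scores
dist = NormalDist.from_samples(scores)  # fits mean and sample standard deviation
print(f"mean={dist.mean:.1f}, std={dist.stdev:.1f}")

# Share of a normal distribution lying within one standard deviation of the mean.
within_one_sd = dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean - dist.stdev)
print(f"share within one standard deviation: {within_one_sd:.3f}")
```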

artificial intelligence, normal distribution, probability distribution, (16 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.34)

Model Specification Test with Unlabeled Data: Approach from Covariate Shift

Kato, Masahiro, Kawarazaki, Hikaru

arXiv.org Machine Learning Nov-2-2019

We propose a novel framework for model specification testing in regression using unlabeled test data. In many cases, statistical inference is conducted under the assumption that the model is correctly specified. However, it is difficult to confirm whether a model is correctly specified. To address this problem, existing work has devised statistical tests for model specification, defining a correctly specified regression model as one whose error term has zero conditional mean over the training data only. Extending the definition used in conventional statistical tests, we define a correctly specified model as one whose error term has zero conditional mean over any distribution of the explanatory variable. This definition follows naturally from the orthogonality of the explanatory variable and the error term. If a model does not satisfy this condition, it may lack robustness to distribution shift. The proposed method allows us to reject a model that is misspecified under this definition. By applying it, we can obtain a model that predicts labels for the unlabeled test data well without losing interpretability. In experiments, we show how the proposed method works on synthetic and real-world datasets.

hypothesis, test data, train data, (15 more...)

arXiv.org Machine Learning

1911.00688

Country:

North America > United States > Tennessee (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
